
[codex] Prototype Codex Apps as virtual HTTP MCP servers#30000

Draft
aibrahim-oai wants to merge 1 commit into main from codex/apps-virtual-mcp-prototype


aibrahim-oai (Collaborator) commented Jun 25, 2026

Important

This is an architecture prototype, not a landing-sized change. The review ask is: is this the right ownership boundary, and did the prototype preserve every behavior worth preserving at the right layer? If the answer is yes, the final section proposes a dependency-ordered landing stack.

Problem

Hosted Codex Apps already speak MCP, but Codex does not currently treat them like ordinary MCP servers.

Today they enter the client as one reserved codex_apps server. Generic MCP and core code then unpack connector annotations from that server's tool list and reconstruct product concepts that MCP should not need to understand:

  • codex-mcp and the connection manager recognize the reserved server, parse connector identity, create connector-specific namespaces, and carry Apps-specific cache, reconnect, auth, approval, file, and refresh behavior.
  • core imports connector types, derives app inventory from MCP tools, renders Apps instructions, handles Apps-specific approval and file paths, tracks connector analytics/presentation, and hard-refreshes Apps after installs.
  • app-server has to construct or query generic MCP machinery when it only wants product-level Apps inventory or auth state.

That inversion makes every Apps behavior a cross-cutting change. It also creates two lifecycle paths for what is fundamentally the same thing: an MCP server with tools, resources, authentication, approvals, and a connection lifecycle.

The target invariant is:

core, codex-mcp, the MCP connection manager, and mcp-server see only ordinary MCP server registrations. They contain no direct Apps/connector dependencies, reserved-name checks, or product-specific branches.

One compatibility seam remains in protocol serialization: generic McpToolSource accessors still serialize the historical connector_id/connector_name/connector_description fields consumed by Guardian and rollout clients. The prototype does not branch on those fields in generic layers, but the boundary verifier cannot prove that semantic distinction.
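The invariant is enforced by a CI script (.github/scripts/verify_codex_apps_mcp_boundary.py in the reviewer map). The Rust sketch below only illustrates the shape of such a token scan. The token list and `boundary_violations` are hypothetical, and, like the real check, a substring scan is a regression guard rather than a semantic proof.

```rust
/// Hypothetical tokens that must not appear in protected generic layers.
/// The real verifier's list is not reproduced here.
const FORBIDDEN_TOKENS: &[&str] = &["codex_apps", "codex-apps", "codex_connectors", "codex-connectors"];

/// Scans one protected source or manifest file and returns
/// (1-based line number, matched token) for every forbidden token.
/// A plain substring scan catches reserved names and dependency names,
/// but it cannot tell a compatibility field apart from a product branch.
fn boundary_violations(source: &str) -> Vec<(usize, &'static str)> {
    let mut hits = Vec::new();
    for (index, line) in source.lines().enumerate() {
        for token in FORBIDDEN_TOKENS {
            if line.contains(*token) {
                hits.push((index + 1, *token));
            }
        }
    }
    hits
}
```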

Product-aware hosts may still use the shared Apps service directly. In particular, app-server should list Apps and read auth/install state from the Apps owner rather than reverse-engineering them from generic MCP tool state. Runtime Apps authentication remains an Apps-owned elicitation flow.

Review priorities, in order

These priorities are ordered. A lower item should not compromise a higher one.

  1. Enforce the ownership boundary. No direct Apps/connector dependencies, reserved-name checks, or product-specific branches in protected generic layers.
  2. Preserve trust and authorization boundaries. Runtime credentials, auth revision, approval identity, private metadata, file permissions, and resource access cannot become weaker to simplify the move.
  3. Preserve user-visible behavior at its natural owner. Namespaces, policy, approvals, elicitations, files, resources, app listing, plugin attribution, analytics, and presentation should continue unless this PR names the change explicitly.
  4. Keep one canonical path from the generic MCP boundary inward. After the Apps adapter exposes loopback servers, they use the same catalog, HTTP client, reconciliation, tool-call, resource, and shutdown machinery as any other MCP server.
  5. Make publication consistent and startup non-blocking. A turn sees one immutable Apps generation; cache/retry/discovery cannot block the model boundary or expose a mixed inventory.
  6. Add only reusable generic mechanisms. A runtime-owned MCP server needs credentials, lifetime ownership, trusted metadata, deterministic precedence, and revisioned contribution. None of those abstractions should mention Apps.
  7. Land in independently green slices. The prototype can be large; the landing sequence should not leave duplicate production paths or an unwired crate between stages.

Intentional or stricter behavior changes

These are not described as code motion. They need explicit reviewer agreement before landing.

  1. Cold discovery no longer blocks the first model request. With no cached/published generation, the first cold turn can proceed without Apps tools or Apps instructions. Background discovery publishes a generation for a later safe boundary. This removes startup latency and failure coupling, but it is a visible availability tradeoff.

  2. The current prototype exposes the real MCP topology through app-server. mcpServerStatus/list changes from one codex_apps entry with grouped tools to a resource-only codex_apps entry plus one codex_apps__<connector> entry per materialized connector namespace. This is not required by the core/MCP ownership boundary because app-server is allowed to understand Apps; preserving the aggregate public shape is an open landing decision. Connector resource reads remain restricted to the URIs declared by that connector's tools.

  3. Generic OAuth does not claim runtime Apps servers. mcpServer/oauth/login rejects these runtime-only registrations. Apps auth remains the private auth-elicitation/install flow owned by the Apps adapter and extension.

  4. The current prototype makes generic install suggestion tools plugin-only. Standalone connectors are no longer returned by list_available_plugins_to_install or accepted by request_plugin_install. This also means codex-tui has no model-driven install candidate when only standalone connectors are available. The architecture does not require that regression: if standalone model-driven installation remains required, it should move behind an Apps-owned extension surface rather than disappear or return as a core branch.

  5. Plugin and Apps guidance follows callable MCP namespaces. Model-facing plugin instructions no longer render a separate “Apps from this plugin” list; declared Apps appear through their contributed MCP namespaces and attribution. Apps instructions retain include_apps_instructions but now require at least one policy-enabled tool in the published snapshot, avoiding instructions for an App with nothing callable.

  6. Explicit/implicit Apps analytics become turn-local. The previous session-global connector selection made every later use look explicit. The extension now classifies explicit use from raw mentions in the current turn. This fixes sticky classification, but an app mention introduced by a skill no longer counts as an explicit user mention.

  7. Malformed inventory is handled more strictly. Blank tool names and incomplete connector identity are omitted; inconsistent display names for one stable connector ID reject the generation. Synthetic-only connectors remain available for auth/install checks but do not become installed Apps or callable connector servers.

  8. Legacy provenance-free Apps caches are not migrated. Old entries cannot prove their upstream URL or SKU. Ignoring them avoids cross-origin/account reuse, but the first run after upgrade may require live discovery.

  9. File handling is stricter. The old path could read from the primary environment without exact sandbox context. The new path requires the pinned environment generation, rejects stale instances and malformed arrays, and checks actual buffered size. Cancellation prevents partial argument rewriting or an upstream invocation, but does not roll back uploads that already completed.

  10. Collision precedence is explicit. An enabled configured server loses to a generated Apps registration with the same connector name. An explicitly disabled same-name server remains a veto for that registration; a disabled configured codex_apps singleton vetoes the whole Apps bundle for compatibility.

  11. Literal-loopback proxy/redirect hardening is generic. The no-proxy rule and literal-loopback redirect restriction live in the shared exec-server HTTP path, so their safety benefit and compatibility impact extend beyond Apps.
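The collision precedence in item 10 can be sketched as a pure merge. This is a hypothetical illustration, not the catalog code: `merge_registrations`, `Winner`, and `bundle_guard` are invented names, and it keys on server names for simplicity. The bundle-wide veto name is passed in by the caller, so the generic merge itself contains no reserved name.

```rust
use std::collections::BTreeMap;

/// Which source won a server name after merging.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Winner {
    Configured,
    Generated,
}

/// Hypothetical merge of configured servers (name -> enabled) with generated
/// registrations. `bundle_guard` names the configured entry whose explicit
/// disablement vetoes every generated registration.
fn merge_registrations(
    configured: &BTreeMap<String, bool>,
    generated: &[String],
    bundle_guard: Option<&str>,
) -> BTreeMap<String, Winner> {
    let mut merged = BTreeMap::new();
    for (name, enabled) in configured {
        if *enabled {
            merged.insert(name.clone(), Winner::Configured);
        }
    }
    // A disabled guard entry vetoes the whole generated bundle.
    let bundle_vetoed = bundle_guard.map_or(false, |guard| configured.get(guard) == Some(&false));
    if bundle_vetoed {
        return merged;
    }
    for name in generated {
        match configured.get(name) {
            // An explicitly disabled same-name server vetoes this registration.
            Some(&false) => {}
            // Otherwise the generated registration wins, even over an enabled configured server.
            _ => {
                merged.insert(name.clone(), Winner::Generated);
            }
        }
    }
    merged
}
```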

Known blockers and open decisions

These are prototype findings, not claims that the branch is landing-ready.

| Priority | Issue | Required follow-up |
| --- | --- | --- |
| P1 | app-server status aggregation and standalone connector suggestions regress even though the boundary does not require either change. | Preserve them at the product-aware owner or obtain explicit API/product approval. |
| P1 | Runtime metadata is a broad capability surface with mostly one current product consumer. | Justify, bound, and minimize every field; avoid replacing explicit product branches with unconstrained generic escape hatches. |
| P1 | Semantically unchanged Apps projections allocate fresh pointer-identity owners and approval-persistence callbacks. | Stabilize identities or compare semantic registration state; add the availability-only reconciliation test with Apps active. Compatible clients are reused today, but the manager itself can churn. |
| P1 | Cold-start integration proves that the turn begins while inventory is blocked, but does not inspect the exact first model request. | Assert that request has no Apps tools/instructions, then assert the later request does; replace real sleeps with deterministic barriers or paused time. |
| P1 | Per-session upstream MCP connections increase fanout compared with the old singleton client. | Measure/justify the isolation tradeoff and add lifecycle/concurrency coverage. |
| P2 | Apps MCP naming helpers remain in connectors, although an earlier version of this description claimed that codex-apps owned naming. | Move routing helpers to the adapter or document the intended stable split. |
| P2 | The proposed final cutover stage is still too large. | Extract pure behavior, land generic primitives with non-Apps proofs, land the HTTP adapter, then keep production wiring/deletion small. |

Architecture at a glance

Names and endpoints

| Name | Exact meaning |
| --- | --- |
| https://chatgpt.com/backend-api/ps/mcp | Default production Hosted Apps endpoint. Other configured bases derive their corresponding /ps/mcp URL. |
| codex_apps (before this PR) | Codex's local registration name for that one remote endpoint. It is not a second service. |
| codex_apps_upstream (after this PR) | Internal client name used by the adapter when it connects to the remote endpoint. |
| codex_apps (after this PR) | A local, resource-only compatibility MCP server. |
| codex_apps__<connector> (after this PR) | A local ordinary HTTP MCP server exposing one connector's tools. The model namespace is mcp__codex_apps__<connector>. |
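A minimal sketch of how connector display names might become these registration names. The sanitizer (alphanumerics kept, everything else becomes `_`), the FNV-1a hash standing in for the stable short-hash helper in utils/string, and the 8-hex-digit suffix are all assumptions. The sketch shows the properties the behavior audit describes: natural names win, collisions are disambiguated by the stable connector ID, and names stay UTF-8-safe within 64 bytes.

```rust
use std::collections::HashMap;

const MAX_NAME_BYTES: usize = 64;

/// Truncates to at most `max` bytes without splitting a UTF-8 character.
fn truncate_utf8(s: &str, max: usize) -> &str {
    if s.len() <= max {
        return s;
    }
    let mut end = max;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Hypothetical sanitizer: keep alphanumerics, replace everything else with '_'.
fn sanitize(display_name: &str) -> String {
    display_name
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { '_' })
        .collect()
}

/// 64-bit FNV-1a, standing in for the real stable short-hash helper.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Maps (stable connector ID, display name) pairs to (ID, server name) pairs.
fn server_names(connectors: &[(&str, &str)]) -> Vec<(String, String)> {
    let bases: Vec<String> = connectors
        .iter()
        .map(|&(_, display)| format!("codex_apps__{}", sanitize(display)))
        .collect();
    // Count how many connectors want each natural (bounded) name.
    let mut demand: HashMap<&str, usize> = HashMap::new();
    for base in &bases {
        *demand.entry(truncate_utf8(base, MAX_NAME_BYTES)).or_insert(0) += 1;
    }
    connectors
        .iter()
        .zip(&bases)
        .map(|(&(id, _), base)| {
            let natural = truncate_utf8(base, MAX_NAME_BYTES);
            let name = if demand[natural] == 1 {
                natural.to_string()
            } else {
                // Collision: append a hash of the stable ID, keeping the total within bounds.
                let suffix = format!("_{:08x}", fnv1a64(id.as_bytes()) as u32);
                let prefix = truncate_utf8(base, MAX_NAME_BYTES - suffix.len());
                format!("{}{}", prefix, suffix)
            };
            (id.to_string(), name)
        })
        .collect()
}
```

The model-facing namespace would then be `mcp__` followed by the server name.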

Before and after

flowchart LR
  subgraph BEFORE["Before: one remote endpoint, one reserved local name"]
    direction TB
    B_CLIENT["Codex MCP client"]
    B_NAME["local registration: codex_apps"]
    B_REMOTE["actual remote HTTP MCP: /backend-api/ps/mcp"]
    B_TOOLS["one logical, potentially paginated tool inventory"]
    B_SPECIAL["codex-mcp + core decode connector metadata and synthesize namespaces"]
    B_CLIENT --> B_NAME --> B_REMOTE --> B_TOOLS --> B_SPECIAL
  end

  subgraph AFTER["After: the product adapter publishes ordinary MCP registrations"]
    direction TB
    A_REMOTE["same Hosted Apps upstream: /backend-api/ps/mcp"]
    A_OWNER["codex-apps + Apps extension: product-aware owner"]
    A_HTTP["one loopback listener per immutable generation"]
    A_ROUTES["HTTP routes: codex_apps resources; codex_apps__calendar tools; codex_apps__gmail tools"]
    A_GENERIC["generic catalog → connection manager → core"]
    A_REMOTE <-->|"inventory + forwarding"| A_OWNER
    A_OWNER --> A_HTTP --> A_ROUTES --> A_GENERIC
    A_OWNER -. "direct inventory + auth state" .-> A_APP["app-server"]
  end

Each immutable generation gets exactly one listener bound to 127.0.0.1:0 (an OS-assigned ephemeral loopback port); there is no listener or process per connector. Each route becomes a separate EffectiveMcpServer registration with its own URL path, runtime-only bearer, policy, and trusted metadata.

Ownership boundary

flowchart TB
  subgraph PRODUCT["Product-aware zone"]
    direction LR
    APP["app-server: direct list, auth, and install APIs"]
    EXT["Apps extension: eligibility, lifecycle, policy, presentation, analytics"]
    APPS["codex-apps: inventory, cache, HTTP adapter, files, resources, auth"]
    SUPPORT["plugin + connectors: product inputs"]
    APP -->|"direct service"| EXT -->|"owns"| APPS
    SUPPORT -->|"declarations + policy"| EXT
    SUPPORT -->|"naming helpers"| APPS
  end

  subgraph GENERIC["Product-agnostic zone"]
    direction LR
    API["extension-api: revisioned effective servers"]
    CORE["core: safe-boundary orchestration"]
    MCP["codex-mcp: catalog + client reconciliation"]
    PATH["ordinary MCP tool, resource, and approval path"]
    API --> CORE --> MCP --> PATH
  end

  EXT -->|"server contributions"| API
  MCP -->|"Streamable HTTP calls to loopback routes"| APPS

mcp-server only constructs and shuts down an opaque host-extension bundle, then passes the registry to core. exec-server and rmcp-client provide generic loopback safety and bounded HTTP responses.

Exposed MCP topology

| Registration | tools/list | Resource enumeration | resources/read |
| --- | --- | --- | --- |
| codex_apps | Empty | Proxies the global upstream resources/templates | Proxies the selected upstream resource |
| codex_apps__<connector> | Only that connector's tools | Empty | Only URIs declared by that connector's tools |
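The topology rules in this table can be sketched as a small decision function. `Route`, `ReadDecision`, and the example `ui://` URIs are hypothetical; the point is that the resource-only route lists no tools but can enumerate and read globally, while a connector route can read only the URIs its own tools declare and cannot enumerate anything.

```rust
use std::collections::HashSet;

/// Hypothetical model of the two kinds of loopback routes in one generation.
enum Route {
    /// The resource-only `codex_apps` compatibility server.
    Resources,
    /// A `codex_apps__<connector>` server.
    Connector { tools: Vec<String>, declared_uris: HashSet<String> },
}

#[derive(Debug, PartialEq)]
enum ReadDecision {
    ProxyUpstream,
    Denied,
}

fn list_tools(route: &Route) -> Vec<String> {
    match route {
        Route::Resources => Vec::new(),
        Route::Connector { tools, .. } => tools.clone(),
    }
}

fn can_enumerate_resources(route: &Route) -> bool {
    matches!(route, Route::Resources)
}

fn authorize_read(route: &Route, uri: &str) -> ReadDecision {
    match route {
        Route::Resources => ReadDecision::ProxyUpstream,
        Route::Connector { declared_uris, .. } if declared_uris.contains(uri) => ReadDecision::ProxyUpstream,
        Route::Connector { .. } => ReadDecision::Denied,
    }
}
```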

Cost and isolation tradeoff

  • The adapter adds one loopback HTTP hop and one listener per live generation.
  • Before this PR, Codex normally held one upstream MCP client connection to /ps/mcp. Afterward, inventory uses a shared discovery connection and every used downstream MCP session lazily opens its own upstream session. That increases connection fanout, but prevents an elicitation in one connector/session from blocking another.
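A hypothetical sketch of the lazy per-session model: `UpstreamSessions` and its integer IDs are invented stand-ins for real upstream MCP sessions. It shows that fanout grows with the number of downstream sessions actually used, and that each one gets its own upstream session on first use.

```rust
use std::collections::HashMap;

/// Hypothetical table mapping downstream MCP session IDs to lazily opened upstream sessions.
struct UpstreamSessions {
    next_id: u64,
    by_downstream: HashMap<String, u64>,
}

impl UpstreamSessions {
    fn new() -> Self {
        UpstreamSessions { next_id: 0, by_downstream: HashMap::new() }
    }

    /// Opens an upstream session only on a downstream session's first use,
    /// so a blocked elicitation in one session cannot stall another.
    fn upstream_for(&mut self, downstream: &str) -> u64 {
        if let Some(id) = self.by_downstream.get(downstream) {
            return *id;
        }
        self.next_id += 1; // stands in for opening a real upstream connection
        let id = self.next_id;
        self.by_downstream.insert(downstream.to_string(), id);
        id
    }

    fn close(&mut self, downstream: &str) -> bool {
        self.by_downstream.remove(downstream).is_some()
    }

    fn open_count(&self) -> usize {
        self.by_downstream.len()
    }
}
```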

Changed-module behavior map

This table covers every production module group in the diff. Test fixtures, manifests, generated build metadata, and Cargo.lock follow the owner they validate or register.

| Module(s) | Behavior after this change | What moved or disappeared |
| --- | --- | --- |
| codex-apps (new) | Owns upstream /ps/mcp protocol adaptation, bounded raw inventory/cache, immutable generations, loopback route authentication, connector/resource proxying, file uploads, auth and standard elicitation bridges, approval presentation, cancellation, and shutdown. | Product behavior leaves core and codex-mcp. It consumes connector naming helpers that still live in connectors; naming is not yet fully consolidated in this crate. |
| ext/mcp::apps | Owns Apps eligibility/config, auth-revision lifecycle, cold-start retry and last-good publication, policy, approval persistence/reviewer choice, prompts, turn-item presentation, plugin attribution, analytics, and install verification. | Replaces implicit core/manager behavior with an explicit product control plane that contributes generic servers. |
| ext/extension-api | Adds generic Current versus Discover contribution modes, captured contributor revisions, runtime-effective server contributions, thread-data initialization, and composable install verification while reusing existing item hooks. | Contains no Apps types or reserved names. |
| codex-mcp | Owns EffectiveMcpServer, runtime-only bearer/owner/metadata, catalog precedence, generic elicitation, launch, status/resources, and per-client reconciliation. | Deletes reserved codex_apps detection, connector parsing, Apps cache/hard refresh, Apps auth parsing, and Apps file/tool normalization. |
| core | Resolves contributions at model safe boundaries, pins McpRuntimeSnapshot, and consumes generic trusted metadata for tool calls, approvals, Guardian, telemetry, resources, and presentation hooks. | Deletes connector inventory, Apps instructions/files/templates/install checks, selected-connector session state, and server-name branches. |
| app-server | Intentionally remains product-aware: one shared Apps extension supplies direct inventory/auth/install state; threadless status/resources prepare Apps and then use the ordinary catalog. Sourceful refresh installs the latest full thread config before rebuilding MCP. | Stops reconstructing Apps by parsing generic MCP tools. The status wire shape currently changes and is an open compatibility decision. |
| connectors | Retains Apps directory metadata, policy inputs, the codex_apps compatibility constant, and connector server/tool/title naming helpers. | No longer supplies plugin app-config ownership. The remaining MCP-routing helpers should either move to codex-apps or be documented as a stable product split. |
| plugin | Owns plugin-declared App configuration. | app_config moves here from connectors; the old path is temporarily re-exported for compatibility. |
| core-plugins | Produces plugin-only discovery candidates and selected-plugin attribution inputs. | Stops mixing accessible connector inventory into generic discoverable tools. |
| chatgpt::connectors | Retains the normal connector directory/cache/merge HTTP APIs. | Deletes “accessible connectors from MCP tools”; accessible inventory now comes from the Apps service. |
| config | Renames the generic approval enum to McpToolApproval while keeping AppToolApproval as an alias; retains generic discoverable-type parsing. | Removes Apps-specific naming from ordinary MCP configuration without breaking the serialized config contract. |
| protocol | Defines generic MCP metadata keys and McpToolSource; supports pinned item presentation and opaque elicitation IDs. | Preserves the legacy serialized connector_* Guardian fields as a compatibility seam. |
| exec-server + rmcp-client | Add generic literal-loopback proxy bypass/redirect confinement, exact environment-instance snapshots, and streaming HTTP response limits before JSON/SSE decoding. | These protections are reusable transport behavior, not Apps branches. |
| login | Adds race-free auth_with_revision() so credentials and revision-scoped runtime state cannot be paired across a refresh race. | Apps lifecycle consumes the generic auth primitive. |
| analytics | Builds Guardian MCP events from generic McpToolSource accessors while retaining compatibility fields. | Connector-aware event construction leaves core call sites. |
| tools | Makes generic install discovery plugin-only. | Standalone connector suggestion/enable shapes are removed in this prototype; an Apps-owned replacement is still an open requirement. |
| ext/skills | Continues to consume orchestrator resources through the compatibility codex_apps resource server. | Has no connector lifecycle or policy ownership. |
| mcp-server | Constructs, retains, and shuts down one opaque McpHostExtensions bundle and passes only its registry to core. | Contains no direct codex-apps/codex-connectors dependency, reserved name, or product branch. |
| CLI | Adapts MCP listing to the generic effective-server API; authentication remains part of status computation. | API adaptation only; no intended user-visible change. |
| memories and thread-manager sample | Remove assignments to core-owned Apps instruction/SKU fields that no longer exist. | Compile/config cleanup; Apps remains disabled for memory workers. |
| utils/string | Owns the stable short-hash helper used for deterministic collision names. | Extracts compatibility naming logic from product-specific code. |
| .github boundary check | Scans protected Rust/manifests and rejects named Apps/connector dependencies or branches, plus the selected in-process transport path. | It is a regression guard, not a semantic proof over protocol schemas or generated content. |

Build manifests register the new crate and dependency direction; the approval template asset moves from core to codex-apps unchanged.
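The literal-loopback rules from the exec-server + rmcp-client row can be sketched with `std::net::IpAddr`. The function names and the redirect-cap parameter are hypothetical (the actual cap value is not stated here). The key point is that only parsed IP literals count as loopback, so a DNS name such as `localhost` never bypasses the ambient proxy.

```rust
use std::net::IpAddr;

/// True only for a literal IP loopback host; DNS names such as "localhost" do not qualify.
fn is_literal_loopback(host: &str) -> bool {
    // Accept bracketed IPv6 hosts as they appear in URLs, e.g. "[::1]".
    let unbracketed = host.trim_start_matches('[').trim_end_matches(']');
    unbracketed
        .parse::<IpAddr>()
        .map(|ip| ip.is_loopback())
        .unwrap_or(false)
}

/// Ambient proxies are bypassed only for literal loopback destinations.
fn should_bypass_proxy(host: &str) -> bool {
    is_literal_loopback(host)
}

/// For a request that started at a literal loopback host, each redirect hop must
/// also be literal loopback; every request stays under the redirect cap.
fn redirect_allowed(origin_host: &str, next_host: &str, redirects_taken: usize, max_redirects: usize) -> bool {
    if redirects_taken >= max_redirects {
        return false;
    }
    !is_literal_loopback(origin_host) || is_literal_loopback(next_host)
}
```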

Why these structural choices

| Primary priority | Decision | Reason |
| --- | --- | --- |
| 1 | Enforce the boundary in CI | Dependency direction alone cannot catch a future reserved-name check or connector-shaped branch in otherwise generic source. |
| 2 | Publish immutable generations as EffectiveMcpServer registrations | Refresh is an atomic swap; runtime-only bearers, owner guards, and trusted metadata participate in launch without entering TOML or serializable config. |
| 3 | Keep product policy and direct product APIs in the extension/app-server | Product facts are translated once into narrow typed generic capabilities; the one behavior-bearing hook commits permanent approval through the product policy owner. |
| 4 | Use Streamable HTTP, with one route/registration per materialized connector namespace on one generation listener | Preserves ordinary MCP execution, independent namespaces, and per-route credentials without an in-process special case or per-connector listener lifecycle. |
| 4 | Reconcile compatible clients instead of restarting the set | Launch, auth, bound-environment, elicitation, and client-context compatibility determine reuse; unrelated stateful MCP clients survive Apps publication. |
| 5 | Separate Current and Discover, capturing contributor revision before resolving contributions | Current projections do not initiate external Apps discovery or refresh, discovery stays non-blocking, and a publication race converges before a later model request. |

Runtime flows

Cold start and discovery

sequenceDiagram
  participant Core as core safe boundary
  participant Catalog as core MCP resolver
  participant Apps as Apps extension
  participant Upstream as Hosted Apps /ps/mcp
  participant Manager as connection manager

  Core->>Catalog: resolve contributions (Discover)
  Catalog->>Apps: current snapshot + revision?
  Apps-->>Catalog: return immediately (possibly empty)
  Note over Core,Apps: Cold contribution does not await network discovery
  Apps->>Upstream: background inventory fetch
  Upstream-->>Apps: complete connector/tool inventory
  Apps->>Apps: validate + build immutable generation
  Apps->>Apps: publish registrations + increment revision
  Core->>Catalog: next safe-boundary resolve
  Catalog->>Apps: read published generation
  Apps-->>Catalog: resource server + connector servers
  Catalog->>Manager: resolved ordinary MCP catalog
  Manager->>Manager: reconcile and reuse compatible client connections
  Note over Apps,Manager: Failed refresh keeps the last-good generation published
Step-by-step behavior

At a model safe boundary, core samples contributor revisions and resolves contributions when the runtime inputs changed. The Apps extension returns its current publication immediately and starts one background initialization per connection key when no usable generation exists.
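A minimal sketch of the safe-boundary resolve, assuming a single Apps contributor. `AppsContributor`, `SafeBoundary`, and the string registrations are hypothetical; the counter stands in for spawning a background task. It shows three described properties: contribution returns immediately, discovery is single-flight, and the revision is captured before contributions are read, so a publish that races in between leaves an older recorded revision and the next boundary re-resolves.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Mutex;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Current,  // project what is published; never starts external discovery
    Discover, // may start background discovery, but still returns immediately
}

/// Hypothetical stand-in for the Apps contributor.
struct AppsContributor {
    revision: AtomicU64,
    published: Mutex<Vec<String>>,
    discovery_in_flight: AtomicBool,
    discovery_starts: AtomicU64,
}

impl AppsContributor {
    fn new() -> Self {
        AppsContributor {
            revision: AtomicU64::new(0),
            published: Mutex::new(Vec::new()),
            discovery_in_flight: AtomicBool::new(false),
            discovery_starts: AtomicU64::new(0),
        }
    }

    fn revision(&self) -> u64 {
        self.revision.load(Ordering::Acquire)
    }

    fn contribute(&self, mode: Mode) -> Vec<String> {
        let snapshot = self.published.lock().unwrap().clone();
        // Single flight: only the first Discover call with nothing published starts work.
        if mode == Mode::Discover && snapshot.is_empty() && !self.discovery_in_flight.swap(true, Ordering::AcqRel) {
            self.discovery_starts.fetch_add(1, Ordering::Relaxed); // stands in for spawning a task
        }
        snapshot
    }

    /// Called by background discovery once a complete generation is validated.
    fn publish(&self, registrations: Vec<String>) {
        *self.published.lock().unwrap() = registrations;
        self.discovery_in_flight.store(false, Ordering::Release);
        self.revision.fetch_add(1, Ordering::AcqRel);
    }
}

/// Hypothetical core-side state at a model safe boundary.
struct SafeBoundary {
    seen_revision: Option<u64>,
    catalog: Vec<String>,
}

impl SafeBoundary {
    /// Returns true when contributions were re-resolved.
    fn resolve(&mut self, apps: &AppsContributor) -> bool {
        let revision = apps.revision(); // captured before reading contributions
        if self.seen_revision == Some(revision) {
            return false;
        }
        self.catalog = apps.contribute(Mode::Discover);
        self.seen_revision = Some(revision);
        true
    }
}
```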

A failed cold initialization retries once immediately. After that, the next attempt becomes eligible only after a cooldown of 1, 2, 4, 8, and 16 seconds, capped at 30 seconds. Cooldown timers perform no network work, and a failed refresh preserves the last-good generation.
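The retry schedule can be written as a pure function of the consecutive-failure count. This is a sketch under one reading of the text (first failure retries at once, later failures step through 1, 2, 4, 8, 16 s and then stay at 30 s); the function names are hypothetical.

```rust
use std::time::Duration;

/// Cooldown before the next attempt after `consecutive_failures` failed cold initializations.
fn cooldown_after(consecutive_failures: u32) -> Duration {
    match consecutive_failures {
        // No failure yet, or the first failure: retry immediately.
        0 | 1 => Duration::from_secs(0),
        n => {
            let exponent = (n - 2).min(5);
            Duration::from_secs((1u64 << exponent).min(30))
        }
    }
}

/// Eligibility is a pure time comparison; waiting performs no network work.
fn eligible_to_retry(elapsed_since_failure: Duration, consecutive_failures: u32) -> bool {
    elapsed_since_failure >= cooldown_after(consecutive_failures)
}
```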

Connector tool call

sequenceDiagram
  participant Core as core generic tool path
  participant MCP as codex-mcp client
  participant App as connector loopback MCP
  participant Env as pinned environment
  participant Backend as ChatGPT backend

  Core->>MCP: call server codex_apps__calendar / tool create_event
  Note over Core,MCP: Generic policy and approval use trusted runtime metadata
  MCP->>App: HTTP MCP call + route-specific bearer
  App->>App: verify route, generation, Origin, and auth guard
  opt Schema declares file parameters
    App->>Env: read with pinned instance + sandbox context
    Env-->>App: bytes or fail closed
    App->>Backend: upload through file API
  end
  App->>App: routed name → stable upstream tool identity
  App->>Backend: /ps/mcp tools/call with sanitized metadata
  Backend-->>App: result or elicitation request
  opt Elicitation required
    App->>MCP: bridge to the initiating downstream session
    MCP-->>App: elicitation response
  end
  App-->>MCP: ordinary result + trusted effective input when rewritten
  MCP-->>Core: ordinary MCP result
  Note over App,Backend: Each downstream session lazily owns one upstream MCP session
Step-by-step behavior

The routed name resolves to the generation's stable upstream identity. Trusted runtime metadata supplies approval, telemetry, plugin attribution, and effective-input capabilities; all response and presentation handling then follows generic MCP/tool lifecycle hooks.
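The per-generation routing step can be pictured as a lookup table. In this hypothetical sketch, `GenerationRoutes`, `UpstreamTool`, and the example connector and tool names are invented; it only illustrates that an unknown route, or a request aimed at a different generation, fails closed instead of guessing an upstream tool.

```rust
use std::collections::HashMap;

/// Stable upstream identity that a routed (possibly collision-suffixed) name resolves to.
#[derive(Clone, Debug, PartialEq)]
struct UpstreamTool {
    connector_id: String,
    tool_name: String,
}

#[derive(Debug, PartialEq)]
enum RouteError {
    StaleGeneration,
    UnknownTool,
}

/// Hypothetical per-generation table: (server, routed tool) -> upstream identity.
struct GenerationRoutes {
    generation: u64,
    tools: HashMap<(String, String), UpstreamTool>,
}

impl GenerationRoutes {
    fn resolve(&self, generation: u64, server: &str, routed_tool: &str) -> Result<&UpstreamTool, RouteError> {
        if generation != self.generation {
            return Err(RouteError::StaleGeneration);
        }
        self.tools
            .get(&(server.to_string(), routed_tool.to_string()))
            .ok_or(RouteError::UnknownTool)
    }
}
```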

Refresh, auth change, and in-flight work

flowchart TB
  START["Published generation G1: listener P1, bearer B1, auth revision R1"]
  TRIGGER{"What changed?"}
  AUTH_EVENT["Re-evaluate auth and Apps eligibility"]
  BUILD["Build G2 off to the side: new listener P2 and new bearers"]
  REMOVE["If no longer eligible: publish removal contributions"]
  PUBLISH["Atomic Apps publication: G2 becomes available for contribution"]
  RECONCILE["Next safe boundary: catalog adopts the change and reconciles clients"]
  OLD["Pinned G1 remains alive while snapshot/runtime owners retain it"]
  KIND{"Did auth revision change?"}
  NORMAL["No: pinned G1 sessions remain valid until released"]
  AUTH["Yes: reject new G1 requests and recheck before upstream forward"]
  INFLIGHT["A call already forwarded upstream may finish"]
  DROP["Last G1 owner drops: cancel sessions and stop listener"]

  START --> TRIGGER
  TRIGGER -->|"inventory refresh"| BUILD
  TRIGGER -->|"login, logout, or token change"| AUTH_EVENT
  AUTH_EVENT -->|"still eligible"| BUILD
  AUTH_EVENT -->|"ineligible or logged out"| REMOVE --> RECONCILE
  BUILD --> PUBLISH --> RECONCILE
  START -. "old snapshots" .-> OLD --> KIND
  KIND -->|"no"| NORMAL --> DROP
  KIND -->|"yes"| AUTH --> INFLIGHT --> DROP
  AUTH --> DROP
Step-by-step behavior
  • Compatible live MCP clients are reused. Transport/bearer, auth revision, terminal client state, concrete environment, OAuth store/keyring, elicitation capabilities/reviewer metadata, or other client-context incompatibility can restart the affected client.
  • Compatible clients can be reused even when the manager is rebuilt. In the current prototype, fresh pointer-identity generation owners and approval-persistence callbacks can make semantically unchanged Apps registrations compare as changed; exact manager retention with Apps active remains an open fix and test gap.
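The auth-revision branch of this flow can be sketched as a guard that checks once on admission and again just before the upstream forward. `AuthState`, `GenerationAuthGuard`, and `admit` are hypothetical names. The sketch does not model the case where a call already forwarded upstream is allowed to finish.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

#[derive(Debug, PartialEq)]
enum Admission {
    Forward,
    RejectStaleAuth,
}

/// Hypothetical shared auth state; the revision bumps on login, logout, or token change.
struct AuthState {
    revision: AtomicU64,
}

/// Guard held by one generation, remembering the auth revision it was built with.
struct GenerationAuthGuard {
    auth: Arc<AuthState>,
    built_with: u64,
}

impl GenerationAuthGuard {
    fn current(&self) -> bool {
        self.auth.revision.load(Ordering::Acquire) == self.built_with
    }

    /// Checks on admission, runs local preparation (e.g. file uploads), then
    /// rechecks immediately before the upstream forward.
    fn admit<F: FnOnce()>(&self, prepare: F) -> Admission {
        if !self.current() {
            return Admission::RejectStaleAuth;
        }
        prepare();
        if self.current() {
            Admission::Forward
        } else {
            Admission::RejectStaleAuth
        }
    }
}
```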

Behavior audit

Detailed preserved behavior

Inventory, identity, and lifecycle

| Behavior | Result |
| --- | --- |
| Eligibility and feature gates | The extension owns product/feature/orchestrator/legacy-veto and auth-keyed connectivity decisions. app-server retains workspace/thread gating for list APIs. |
| Connector namespaces | Sanitized connector display names become codex_apps__<connector> MCP server names; stable connector IDs disambiguate collisions. Tools route through a per-generation raw-name map. |
| Name compatibility | Natural names win; collisions use the existing deterministic identity hash; connector and tool identifiers remain UTF-8-safe and bounded to 64 bytes. Approval identity uses stable upstream identity, not the routed collision name. |
| Cache isolation | Raw inventory is scoped by identity, upstream URL, product SKU, workspace/account state, and Codex home. Routing and policy are always re-derived instead of persisted. |
| Warm/cold behavior | A warm generation is callable immediately while live refresh proceeds. Initialization is single-flight; failures preserve last-good state and use bounded retry. |
| Refresh consistency | Each refresh publishes a new immutable generation atomically. Pinned old snapshots remain internally consistent until their work completes. |
| Shutdown | Generation owners, HTTP sessions, upstream sessions, background initialization, pending elicitations, resource reads, and uploads all receive deterministic cancellation/cleanup. |
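Cache isolation can be sketched as an exact provenance match. `CacheProvenance`, `CachedInventory`, and `usable_inventory` are hypothetical names, and the field set simply mirrors the scoping listed in the table. A `None` provenance models a legacy, provenance-free entry, which is ignored rather than migrated.

```rust
/// Hypothetical provenance recorded with each cached raw inventory.
#[derive(Clone, Debug, PartialEq, Eq)]
struct CacheProvenance {
    identity: String,
    upstream_url: String,
    product_sku: String,
    workspace_account: String,
    codex_home: String,
}

/// Only raw inventory is cached; routing and policy are re-derived on load.
struct CachedInventory {
    provenance: Option<CacheProvenance>, // None models a legacy provenance-free entry
    raw_inventory: String,
}

/// Returns the raw inventory only when its recorded provenance exactly matches the live one.
fn usable_inventory<'a>(entry: &'a CachedInventory, live: &CacheProvenance) -> Option<&'a str> {
    match &entry.provenance {
        Some(recorded) if recorded == live => Some(entry.raw_inventory.as_str()),
        // Legacy or mismatched entries are ignored; live discovery repopulates the cache.
        _ => None,
    }
}
```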

Auth, approval, and trust

| Behavior | Result |
| --- | --- |
| Auth revision | Connection identity includes auth revision. Stale observations cannot undo a newer logout/login decision. Old registrations reject new work without cancelling a call already forwarded upstream. |
| Apps auth elicitation | The authenticated route supplies connector identity. An optional connector ID in the private errored-result envelope must match it. The local client derives the install URL; success refreshes and republishes the exact namespace set. |
| Standard MCP elicitations | Form, URL, and openai/form requests bridge to the originating downstream session. Unsupported capabilities cancel rather than leak across sessions. |
| Tool policy | Stable connector ID, upstream tool identity/title, destructive/open-world annotations, and requirements become ordinary enabled-tool and approval configuration before registration. |
| Approval identity | Historical server identity + connector ID + upstream tool name remain the persistence key, so namespace collision churn cannot transfer an approval. |
| Approval UX | Consequential-tool templates, labels, headers, Guardian source, reviewer constraints, and permanent-approval behavior move to Apps-owned presentation/policy modules. |
| Metadata trust | Upstream generic approval context is stripped. Connected-account and effective-input metadata are honored only when recreated by the authenticated proxy and the runtime registration explicitly marks them trusted. |
| Persistence privacy | Connected-account and private approval context are stripped on cache write and read and cannot enter model requests or rollout history. |
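The approval-identity rule can be sketched as a key type that deliberately omits the routed server name. `ApprovalKey`, `ToolCall`, and the example IDs are hypothetical, and `"codex_apps"` stands in for the historical server identity. A collision suffix that renames the route leaves the key unchanged, while a different connector behind the same routed name does not inherit the approval.

```rust
use std::collections::HashSet;

/// Hypothetical persistence key: historical server identity + stable connector ID + upstream tool name.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ApprovalKey {
    server: &'static str,
    connector_id: String,
    upstream_tool: String,
}

struct ToolCall<'a> {
    routed_server: &'a str, // may change when collision suffixes change
    connector_id: &'a str,
    upstream_tool: &'a str,
}

fn approval_key(call: &ToolCall) -> ApprovalKey {
    // The routed server name is deliberately not part of the key.
    ApprovalKey {
        server: "codex_apps",
        connector_id: call.connector_id.to_string(),
        upstream_tool: call.upstream_tool.to_string(),
    }
}
```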

Files, resources, plugins, and presentation

| Behavior | Result |
| --- | --- |
| File parameters | openai/fileParams become local-path schemas. Reads use the exact pinned environment/sandbox state, validate type and buffered size, upload before forwarding, and preserve array constraints. Cancellation prevents partial argument rewriting/upstream invocation; completed uploads are not rolled back. |
| Effective tool input | The proxy removes upstream-supplied effective-input metadata and emits its own only after a file rewrite. Core accepts it only with trusted registration metadata and the negotiated MCP capability. |
| Resources | A resource-only codex_apps server proxies global resources/templates and exposes no tools. Connector routes can read only URIs declared by their tools and cannot enumerate the global set. The skills extension intentionally retains that resource-server compatibility name for orchestrator resources. |
| Plugin attribution | Installed and thread-selected plugin connector declarations become generic display-name metadata on the contributed server. Selected attribution follows exact step environment availability. |
| Plugin install completion | A plugin install succeeds only after base installation and every declared connector is present in the current Apps snapshot. |
| Turn-item presentation | An in-progress call pins connector presentation, so refresh cannot change its completion metadata, action, link, or template halfway through the item. |
| Model instructions | Apps instructions move to the extension, retain include_apps_instructions, require a policy-enabled tool in the published snapshot, and describe per-connector namespaces. |
| Observability | Logical Hosted Apps origin replaces ephemeral loopback ports. Existing Apps server/tool telemetry identity and connector labels remain; proxy tools/list avoids duplicate physical-transport accounting. |
| App listing | Pagination, workspace/thread feature gates, directory metadata, progressive notifications, warm-cache response, and live follow-up remain in app-server's direct service path. |
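The file-parameter rules can be sketched as two pure steps. `PinnedEnvironment`, `check_read`, `rewrite_file_args`, and the integer instance IDs are hypothetical. Reads must come from the pinned environment instance and are size-checked on the bytes actually buffered. Arguments are rewritten only if every upload succeeds, so a cancellation never yields partially rewritten arguments, although uploads that already finished stay finished.

```rust
#[derive(Debug, PartialEq)]
enum FileError {
    StaleEnvironment,
    TooLarge { actual: usize, max: usize },
    Cancelled,
}

/// Hypothetical handle for the environment instance pinned for this call.
struct PinnedEnvironment {
    instance_id: u64,
}

/// Validates one read against the pinned instance and the actually buffered size.
fn check_read(pinned: &PinnedEnvironment, reading_instance: u64, buffered: &[u8], max_bytes: usize) -> Result<(), FileError> {
    if reading_instance != pinned.instance_id {
        return Err(FileError::StaleEnvironment);
    }
    if buffered.len() > max_bytes {
        return Err(FileError::TooLarge { actual: buffered.len(), max: max_bytes });
    }
    Ok(())
}

/// Uploads every path first and returns replacement IDs only if all succeed,
/// so the caller never forwards partially rewritten arguments. Uploads that
/// completed before a failure are not rolled back.
fn rewrite_file_args<F>(paths: &[&str], mut upload: F) -> Result<Vec<String>, FileError>
where
    F: FnMut(&str) -> Result<String, FileError>,
{
    let mut file_ids = Vec::with_capacity(paths.len());
    for path in paths {
        file_ids.push(upload(*path)?);
    }
    Ok(file_ids)
}
```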

Security invariants

  • Bind only to literal 127.0.0.1:0.
  • Generate an independent random 256-bit bearer for every connector and resource route in every generation.
  • Keep bearers runtime-only, redacted, and mutually exclusive with configured authorization.
  • Compare authorization in constant time and reject every request containing Origin, even with a valid bearer.
  • A bearer authorizes only its exact route and generation. Ordinary refresh keeps a pinned old route/credential alive while old snapshot/runtime owners remain; an auth-revision change rejects new requests through old routes.
  • Recheck the auth-generation access guard immediately before forwarding upstream.
  • Bypass ambient proxies only for literal IP loopback URLs; permit redirects only while every hop remains literal loopback, under the normal redirect cap. This policy lives in the shared exec-server HTTP path, not only the Apps adapter.
  • Bound complete inventory time, page count, tool count, serialized inventory/cache size, and each Hosted Apps upstream POST response. Enforce response bounds while streaming, before JSON/SSE decoding.
  • Pin environment instance, cwd, sandbox state, and permission profile for file reads.
  • Restrict connector resource reads to URIs declared by that connector's tools.
  • Strip private approval/account context before persistence and before model/rollout serialization.
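Several of these invariants can be checked per request. The following is a minimal sketch, not the prototype's handler. `Route`, `Request`, `Reject`, `authorize`, and the redirect helpers are hypothetical names, and the 256-bit bearer is assumed to come from a CSPRNG elsewhere. The hand-rolled comparison only illustrates the idea. Production code would normally use a vetted primitive such as the `subtle` crate's `ConstantTimeEq`, because an optimizer can shortcut a plain loop.

```rust
use std::net::IpAddr;

/// Hypothetical per-route credential: one bearer per connector/resource route per generation.
pub struct Route {
    pub path: String,
    pub generation: u64,
    pub auth_revision: u64,
    pub bearer: [u8; 32], // 256-bit value, assumed to be generated elsewhere
}

/// Hypothetical view of an incoming loopback request.
#[derive(Clone, Copy)]
pub struct Request<'a> {
    pub path: &'a str,
    pub generation: u64,
    pub origin: Option<&'a str>,
    pub bearer: &'a [u8; 32],
}

#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    OriginPresent,
    WrongRoute,
    BadBearer,
    StaleAuthRevision,
}

/// Compares all 32 bytes regardless of where the first difference is.
/// Illustrative only: prefer a vetted constant-time primitive in production.
fn ct_eq(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for i in 0..32 {
        diff |= a[i] ^ b[i];
    }
    diff == 0
}

pub fn authorize(route: &Route, req: &Request, current_auth_revision: u64) -> Result<(), Reject> {
    // Any Origin header indicates a browser context: reject even with a valid bearer.
    if req.origin.is_some() {
        return Err(Reject::OriginPresent);
    }
    // A bearer is scoped to one exact route and generation.
    if req.path != route.path || req.generation != route.generation {
        return Err(Reject::WrongRoute);
    }
    if !ct_eq(&route.bearer, req.bearer) {
        return Err(Reject::BadBearer);
    }
    // Ordinary refresh keeps pinned routes usable; an auth-revision change does not.
    if route.auth_revision != current_auth_revision {
        return Err(Reject::StaleAuthRevision);
    }
    Ok(())
}

/// True only for literal loopback IPs (brackets allowed for IPv6). Names such
/// as "localhost" are rejected because they require resolution.
pub fn is_literal_loopback(host: &str) -> bool {
    let bare = host.strip_prefix('[').and_then(|h| h.strip_suffix(']')).unwrap_or(host);
    bare.parse::<IpAddr>().map(|ip| ip.is_loopback()).unwrap_or(false)
}

/// Redirects are allowed only while every hop stays on a literal loopback host
/// and the number of redirects stays within the cap.
pub fn redirects_allowed(redirect_hosts: &[&str], max_redirects: usize) -> bool {
    redirect_hosts.len() <= max_redirects && redirect_hosts.iter().all(|h| is_literal_loopback(h))
}
```

A passing `authorize` call does not replace the separate auth-generation recheck immediately before forwarding upstream, because the revision can change between the two points.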

Reviewer map

Smallest useful file-reading order
  1. .github/scripts/verify_codex_apps_mcp_boundary.py — the invariant being enforced.
  2. codex-rs/ext/extension-api/src/contributors/mcp.rs — the generic contribution contract.
  3. codex-rs/codex-mcp/src/server.rs, catalog.rs, runtime_metadata.rs — runtime registration, precedence, and trusted capabilities.
  4. codex-rs/apps/src/lib.rs, generation.rs, http.rs, connector_server.rs, resource_server.rs — immutable HTTP generations and protocol forwarding.
  5. codex-rs/ext/mcp/src/apps/ — eligibility, lifecycle, policy, presentation, analytics, and install verification.
  6. codex-rs/core/src/mcp.rs, session/mcp.rs, session/mcp_runtime.rs — generic projection and safe-boundary reconciliation.
  7. codex-rs/app-server/src/request_processors/apps_processor.rs, mcp_processor.rs, plugins.rs — the intentional product-aware direct service boundary.
  8. codex-rs/ext/mcp/src/lib.rs and codex-rs/mcp-server/src/message_processor.rs — opaque host composition.
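The boundary verifier at the top of this reading order is a Python script in the prototype. The Rust sketch below only illustrates the general shape of an identifier check over one protected-layer file. The `Violation` type, the token list, and the choice to skip `//` comment lines are assumptions for illustration, not the script's actual rules.

```rust
/// Hypothetical boundary violation: which forbidden token appeared where.
#[derive(Debug, PartialEq, Eq)]
pub struct Violation {
    pub file: String,
    pub line: usize,
    pub token: &'static str,
}

/// Scans one protected-layer source file for forbidden product identifiers.
/// Line numbers are 1-based; lines that start with `//` are skipped.
pub fn scan_file(file: &str, contents: &str, forbidden: &[&'static str]) -> Vec<Violation> {
    let mut out = Vec::new();
    for (idx, line) in contents.lines().enumerate() {
        let code = line.trim_start();
        if code.starts_with("//") {
            continue;
        }
        for &token in forbidden {
            if code.contains(token) {
                out.push(Violation { file: file.to_string(), line: idx + 1, token });
            }
        }
    }
    out
}
```

The same scan could run over each protected crate's `Cargo.toml` with dependency names as tokens. TOML comments start with `#`, so the comment rule would need adjusting.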

Validation

The published prototype head 1043025e passed its GitHub checks on June 26, 2026. The branch is now behind main, and the local rebase is unfinished and has unresolved conflicts, so this description does not claim that the rebased tree is green. The test strategy follows the risks rather than the file layout:

  • Boundary: the verifier checks protected-layer product dependencies/identifiers and explicitly rejects the named in-process transport path in codex-apps and its extension.
  • Transport and trust: route/generation bearer isolation, constant-time auth, Origin rejection, literal-loopback redirect/proxy rules, response/inventory/upload bounds, private metadata stripping, and cancellation.
  • Publication and lifecycle: warm cache, cold single flight, retry/backoff, last-good fallback, concurrent refresh, immutable/pinned generations, auth rekeying, publication races, stale-generation rejection, and shutdown.
  • Generic reconciliation: per-server restart on environment, auth, transport, or elicitation-runtime change and preservation of compatible live clients. Exact manager reuse is covered only without an active Apps extension; Apps-active semantic equality remains a known test/fix gap.
  • Core integration: discovery, connector tool calls, auth refresh, approval identity/context, Guardian review, files, resources, plugin attribution/install completion, analytics, presentation, and environment availability.
  • Host integration: app-server v2 Apps list/status/resource and runtime-OAuth rejection behavior, Apps auth elicitation, and a real mcp-server Hosted Apps end-to-end test.
  • Platforms: Linux, macOS, and Windows CI must be green on the rebased head before landing.
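The generic-reconciliation bullet describes a per-server diff: restart only the servers whose restart-relevant identity changed, and keep compatible live clients. The following is a sketch. `ServerIdentity`, `Plan`, and `reconcile` are hypothetical names, and the identity holds exactly the four fields the bullet names. The prototype's real notion of semantic identity is likely richer, and Apps-active equality is the listed gap.

```rust
use std::collections::BTreeMap;

/// Hypothetical identity of a registered server: the fields whose change forces a restart.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ServerIdentity {
    pub environment: String,
    pub auth_revision: u64,
    pub transport: String, // e.g. the HTTP URL of the server
    pub elicitation_runtime: String,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Plan {
    pub keep: Vec<String>,
    pub restart: Vec<String>,
    pub start: Vec<String>,
    pub stop: Vec<String>,
}

/// Diffs live clients against desired registrations. BTreeMap iteration gives
/// a deterministic, name-sorted plan.
pub fn reconcile(
    live: &BTreeMap<String, ServerIdentity>,
    desired: &BTreeMap<String, ServerIdentity>,
) -> Plan {
    let mut plan = Plan::default();
    for (name, want) in desired {
        match live.get(name) {
            Some(have) if have == want => plan.keep.push(name.clone()),
            Some(_) => plan.restart.push(name.clone()),
            None => plan.start.push(name.clone()),
        }
    }
    for name in live.keys() {
        if !desired.contains_key(name) {
            plan.stop.push(name.clone());
        }
    }
    plan
}
```

In this sketch, unchanged-client preservation amounts to `have == want`. If equality is defined too strictly, every refresh restarts every server; if it is defined too loosely, an auth change keeps a stale client alive.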

Revised proposed landing stack

Each stage should be independently green. A staged adapter may exist behind tests, but no stage should leave two production implementations active.

  1. Generic HTTP safety and bounds: loopback redirect/proxy handling and streaming response limits.
  2. Pure ownership extraction: move deterministic naming, raw cache, approval templates, and file-schema transforms to their intended product owner while legacy callers still use them.
  3. Runtime MCP registrations: EffectiveMcpServer, the smallest justified runtime bearer/owner/metadata surface, deterministic catalog semantics, and non-Apps proofs.
  4. Extension contribution and reconciliation: Current/Discover, revisions, publication-race handling, safe-boundary projection, stable semantic identity, and non-Apps unchanged-client tests.
  5. Plugin ownership cleanup: move App declarations to the plugin crate and express selected-plugin attribution through generic contributions.
  6. Apps HTTP adapter behind integration tests: immutable generations, connector/resource routes, auth, files, elicitations, and upstream session isolation, without production cutover.
  7. Apps extension and direct host APIs: lifecycle/policy/presentation/analytics/install ownership plus app-server direct inventory, still without switching the core execution path.
  8. Small atomic production cutover: register the ordinary HTTP servers, switch app-server/core consumers, and delete the legacy core/manager Apps path in the same commit.
  9. Opaque host boundary and enforcement: reduce mcp-server to generic host composition, enable the boundary verifier, and retain the real-host end-to-end proof.

The app-server status shape, standalone connector suggestions, cold-first-turn semantics, and other intentional behavior changes should land separately or be explicitly approved; they should not inflate the structural cutover.

aibrahim-oai force-pushed the codex/apps-virtual-mcp-prototype branch from 11b6cb0 to d0a8930 on June 25, 2026 06:44
aibrahim-oai changed the title [codex] Prototype Codex Apps as virtual MCP servers to [codex] Prototype Codex Apps as virtual HTTP MCP servers on June 25, 2026
aibrahim-oai force-pushed the codex/apps-virtual-mcp-prototype branch 5 times, most recently from 06221a7 to b5b7ecc on June 26, 2026 01:52
aibrahim-oai force-pushed the codex/apps-virtual-mcp-prototype branch from b5b7ecc to 42e8a8e on June 26, 2026 04:20
aibrahim-oai force-pushed the codex/apps-virtual-mcp-prototype branch 2 times, most recently from 623bd99 to b1b3d96 on June 26, 2026 06:40
aibrahim-oai force-pushed the codex/apps-virtual-mcp-prototype branch from b1b3d96 to 1043025 on June 26, 2026 06:56


I would make the runtime-server move prove one boundary before landing: generic MCP layers should never need to know that a server came from Apps.

The risk in this shape is not the loopback adapter itself. It is a partial migration where Apps are exposed as ordinary MCP servers, but approval identity, file-argument rewriting, resource access, auth revision, and install eligibility still have hidden connector-shaped branches in core or the connection manager. That would preserve the old coupling under a new transport name.

Useful regression shape:

  1. Register an Apps-backed runtime MCP server and a configured MCP server with the same logical namespace, then assert precedence and disabled-server veto are decided by generic catalog rules only.
  2. Trigger a consequential tool call through the generated server and assert the approval key uses stable upstream server and tool identity, not routed display name or collision suffix.
  3. Exercise file parameters through a pinned environment generation and fail closed on stale generation, malformed arrays, size limit breach, or unauthorized path before upstream invocation.
  4. Refresh Apps during an in-flight call and prove old bearer/generation can finish only the already-forwarded work while new work uses the new generation.
  5. Verify app-server can read product inventory directly, while core, codex-mcp, and mcp-server pass the boundary verifier with no Apps or connector branches.

That keeps the architecture honest: Apps can own product policy and auth, but the model/runtime boundary should consume only ordinary MCP registrations plus trusted generic metadata.
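Item 2 of the regression shape can be made concrete with an approval key derived only from stable upstream identity. The following is a sketch. `RoutedTool` and `approval_key` are hypothetical names, the struct is assumed to carry both the routed display name and the upstream server/tool identity, and the key format is illustrative.

```rust
/// Hypothetical tool as exposed to the model after routing, plus the stable
/// identity it was derived from.
#[derive(Clone, Debug)]
pub struct RoutedTool {
    /// Display name after namespacing and any collision suffix, e.g. "gmail__send_2".
    pub routed_name: String,
    /// Stable upstream server identity (not the loopback URL or port).
    pub upstream_server_id: String,
    /// Tool name as reported by the upstream server.
    pub upstream_tool_name: String,
}

/// Builds the approval key from stable upstream identity only. Each component
/// is length-prefixed so ("a/b", "c") and ("a", "b/c") cannot collide.
pub fn approval_key(tool: &RoutedTool) -> String {
    format!(
        "{}:{}/{}:{}",
        tool.upstream_server_id.len(),
        tool.upstream_server_id,
        tool.upstream_tool_name.len(),
        tool.upstream_tool_name
    )
}
```

Because `routed_name` never enters the key, a collision suffix or a new loopback port cannot invalidate a remembered approval or attach it to a different tool.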

Boundary: architecture and regression-test feedback only; no claim about running this branch or validating implementation behavior.

